

A Preliminaries

Neural Information Processing Systems

This section contains some standard linear-algebra results used in the proofs. Subsection C.1 shows that, with high probability, Curated SVD achieves Objective (ii), and that both objectives can be achieved simultaneously: combining Markov's inequality with the preceding lemma shows that, with probability 4/5, Curated SVD finds the required subset. The rest of the subsection proves the lemma.


Classification with Nearest Disjoint Centroids

Fraiman, Nicolas, Li, Zichao

arXiv.org Machine Learning

In this paper, we develop a new classification method based on nearest centroids, called the nearest disjoint centroid classifier. Our method differs from the nearest centroid classifier in two aspects: (1) the centroids are defined on disjoint subsets of the features instead of all the features, and (2) the distance is induced by the dimensionality-normalized norm instead of the Euclidean norm. We provide several theoretical results regarding our method. In addition, we propose a simple algorithm based on adapted k-means clustering that finds the disjoint subsets of features used in our method, and we extend the algorithm to perform feature selection. We evaluate and compare the performance of our method against other closely related classifiers on both simulated data and real-world gene expression datasets. The results demonstrate that our method outperforms competing classifiers by achieving smaller misclassification rates and/or using fewer features in various settings.
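The classification rule the abstract describes can be sketched in a few lines: each class has its own disjoint feature subset, its centroid is computed only on that subset, and a test point is assigned to the class whose centroid is nearest under the dimensionality-normalized (squared) distance. The data, subset assignment, and function names below are illustrative, not the authors' implementation.

```python
# Hedged sketch of a nearest disjoint centroid rule: per-class feature
# subsets, per-class centroids, and a distance normalized by subset size.

def centroid(rows, subset):
    """Mean of the training rows restricted to the given feature subset."""
    return [sum(r[j] for r in rows) / len(rows) for j in subset]

def dn_distance(x, c, subset):
    """Squared Euclidean distance on the subset, divided by its dimension."""
    return sum((x[j] - cj) ** 2 for j, cj in zip(subset, c)) / len(subset)

def predict(x, centroids, subsets):
    """Assign x to the class whose centroid is nearest under dn_distance."""
    return min(centroids, key=lambda k: dn_distance(x, centroids[k], subsets[k]))

# Toy example: class "a" uses features {0, 1}, class "b" uses feature {2}.
subsets = {"a": [0, 1], "b": [2]}
train = {"a": [[1.0, 1.0, 9.0], [1.2, 0.8, 8.0]],
         "b": [[5.0, 5.0, 0.1], [6.0, 4.0, -0.1]]}
centroids = {k: centroid(rows, subsets[k]) for k, rows in train.items()}
print(predict([1.1, 0.9, 7.0], centroids, subsets))
```

Dividing by the subset size keeps classes with large feature subsets from being penalized merely for accumulating squared terms over more coordinates.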


Graphical structure of conditional independencies in determinantal point processes

Tadić, Tvrtko

arXiv.org Machine Learning

Determinantal point processes have recently been used as models in machine learning, which has raised questions about how to characterize their conditional independencies. In this paper we investigate such characterizations. We describe some conditional independencies through conditions on the kernel of a determinantal point process, and show that many can be obtained from the graph induced by the kernel of the L-ensemble. In recent years there have been several machine learning papers on applications of determinantal point processes (DPPs) [4], [7], [8], [9]... An overview of the theory, recent applications, and open problems in learning DPPs is given in an extensive survey [6] by Kulesza and Taskar. In a private communication with Ben Taskar, one of the open questions from the survey [6] (see §7.3) was brought to the author's attention: is there a simple characterization of the conditional independence relations encoded by a DPP? The question arises naturally with conditional independence structure models (see [12]), such as the widely used graphical models (see [11]), in mind. It turns out that, from a mathematical viewpoint, elegant characterizations similar to those for graphical models exist. This paper provides two main characterizations: the corresponding block in a Schur complement of the kernel has to be a 0-block (Theorem 16, Proposition 17); and the structure of the graph induced by the kernel of the L-ensemble can be used to read off many conditional independencies in the process (Theorem 28, Proposition 30).
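The Schur-complement condition mentioned in the abstract can be illustrated numerically. For a kernel K partitioned over index sets A and C, the Schur complement K/K_CC = K_AA - K_AC K_CC^{-1} K_CA plays the role of a conditional kernel, and the paper's characterization involves an off-diagonal 0-block there. The 3x3 kernel below is a made-up example constructed so the off-diagonal entry cancels exactly after conditioning; consult the paper (Theorem 16, Proposition 17) for the precise statement.

```python
# Numeric illustration of a Schur complement of a DPP kernel:
# K/K_CC = K_AA - K_AC K_CC^{-1} K_CA, for a 1-element conditioning set C.

def schur_complement(K, A, C):
    """Schur complement of K with respect to a single conditioning index."""
    c = C[0]
    inv = 1.0 / K[c][c]          # K_CC is 1x1 here, so inversion is scalar
    return [[K[a][b] - K[a][c] * inv * K[c][b] for b in A] for a in A]

# Chosen so that K[0][1] = K[0][2] * K[2][1] / K[2][2], i.e. the
# off-diagonal entry vanishes in the complement.
K = [[1.0, 0.2, 0.4],
     [0.2, 1.0, 0.5],
     [0.4, 0.5, 1.0]]
S = schur_complement(K, A=[0, 1], C=[2])
print(round(S[0][1], 10))
```

The vanishing off-diagonal entry is exactly the kind of 0-block the characterization refers to; in higher dimensions A and C are blocks and K_CC^{-1} is a genuine matrix inverse.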


Anytime Hierarchical Clustering

Arslan, Omur, Koditschek, Daniel E.

arXiv.org Machine Learning

We propose a new anytime hierarchical clustering method that iteratively transforms an arbitrary initial hierarchy on the configuration of measurements along a sequence of trees which, we prove, must terminate for a fixed data set in a chain of nested partitions satisfying a natural homogeneity requirement. Each recursive step re-edits the tree so as to improve a local measure of cluster homogeneity that is compatible with a number of commonly used linkage functions (e.g., single, average, complete). As an alternative to the standard batch algorithms, we present numerical evidence suggesting that appropriate adaptations of this method can yield decentralized, scalable algorithms suitable for distributed/parallel computation of clustering hierarchies, as well as for online tracking of clustering trees in large, dynamically changing databases and for anomaly detection.


Conditional Independence in Uncertainty Theories

Shenoy, Prakash P.

arXiv.org Artificial Intelligence

This paper introduces the notions of independence and conditional independence in valuation-based systems (VBS). VBS is an axiomatic framework capable of representing many different uncertainty calculi. We define independence and conditional independence in terms of factorization of the joint valuation. The definitions of independence and conditional independence in VBS generalize the corresponding definitions in probability theory. Our definitions apply not only to probability theory, but also to Dempster-Shafer's belief-function theory, Spohn's epistemic-belief theory, and Zadeh's possibility theory. In fact, they apply to any uncertainty calculus that fits in the framework of valuation-based systems.
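In the probabilistic special case, the factorization definition mentioned above reduces to the familiar one: X and Y are independent iff the joint valuation (here a joint pmf) factors into the product of its marginals. The toy check below illustrates only that special case; VBS covers other calculi with other combination rules.

```python
# Illustrative check of factorization-based independence for a joint pmf
# given as a dict mapping (x, y) outcomes to probabilities.

def marginals(joint):
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def factorizes(joint, tol=1e-12):
    """True iff joint(x, y) == px(x) * py(y) on every listed outcome."""
    px, py = marginals(joint)
    return all(abs(p - px[x] * py[y]) <= tol for (x, y), p in joint.items())

indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
dep   = {(0, 0): 0.5,  (1, 1): 0.5}
print(factorizes(indep), factorizes(dep))
```

In the VBS generalization, the product of marginals is replaced by the framework's combination operator applied to the factor valuations, which is what lets the same definition cover belief functions and possibility measures.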


Causal models have no complete axiomatic characterization

Li, Sanjiang

arXiv.org Artificial Intelligence

Markov networks and Bayesian networks are effective graphical representations of the dependencies embedded in probabilistic models. It is well known that the independencies captured by Markov networks (called graph-isomorphs) have a finite axiomatic characterization. This paper, however, shows that the independencies captured by Bayesian networks (called causal models) admit no axiomatization by even countably many Horn or disjunctive clauses. This is because a sub-independency model of a causal model may not itself be causal, while graph-isomorphs are closed under sub-models.